Web Crawlers: Taxonomy, Issues & Challenges
Abstract
With the increase in the size of the Web, search engines rely on Web crawlers to build and maintain indices of billions of pages for efficient searching. Web crawlers create and maintain these indices by recursively traversing and downloading Web pages on behalf of search engines. The exponential growth of the Web poses many challenges for crawlers. This paper attempts to classify the existing crawlers according to certain parameters and also identifies the various challenges faced by Web crawlers.

Keywords: WWW, URL, Mobile Crawler, Mobile Agents, Web Crawler.
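To ground the traversal described above, here is a minimal sketch of the recursive fetch-parse-follow loop a crawler performs. It is illustrative only: the seed URL and page budget are hypothetical parameters, and it omits the robots.txt handling, politeness delays, and distributed frontier a production crawler needs.

# Minimal breadth-first crawler sketch (illustrative only; not from the paper).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=50):
    frontier = deque([seed])   # URLs waiting to be fetched
    seen = {seed}              # avoids re-downloading pages
    index = {}                 # URL -> raw HTML, a stand-in for the search index
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue           # skip unreachable pages
        index[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:          # recursively follow out-links
            absolute = urljoin(url, href)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return index

Starting from a seed, the loop downloads each page, records it in a stand-in index, extracts its out-links, and enqueues unseen URLs, which is exactly the recursive traversal the abstract refers to.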
Similar Articles
A brief history of web crawlers
Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web, and the complexity added to web applications, have made the proces...
Web Crawler: Extracting the Web Data
Internet usage has increased greatly in recent times. Users find their resources by following hypertext links. This usage of the Internet has led to the invention of web crawlers. Web crawlers are full-text search engines that assist users in navigating the web. These web crawlers can also be used in further research activities. For example, the crawled data can be used to find missing links, ...
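As one concrete example of reusing crawled data, the sketch below checks which links extracted from crawled pages no longer resolve, a simple form of the missing-link detection mentioned above. The function name and its input are illustrative assumptions, not from the paper.

# Sketch: detecting broken ("missing") links among extracted URLs.
from urllib.request import urlopen
from urllib.error import URLError, HTTPError

def find_broken_links(extracted_links):
    """extracted_links: any iterable of absolute URLs taken from crawled pages."""
    broken = []
    for link in extracted_links:
        try:
            urlopen(link, timeout=5)     # a HEAD-style probe would be cheaper
        except (HTTPError, URLError):
            broken.append(link)          # link target is missing or unreachable
    return broken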
Optimization Issues in Web Search Engines
Crawlers are deployed by a Web search engine to collect information from different Web servers in order to maintain the currency of its database of Web pages. We present studies on the optimization of Web search engines from different perspectives. We first investigate the number of crawlers a search engine should use so as to maximize the currency of the database without putting an un...
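The abstract is cut off, but the trade-off it studies can be illustrated with the classic freshness estimate from the crawling literature, which assumes each page changes as a Poisson process. This model is a common assumption in such analyses, not necessarily the one this paper uses.

# Sketch: expected freshness of a page under a Poisson change model,
# F(f, lam) = (f/lam) * (1 - exp(-lam/f)), where f is the refresh rate
# and lam the page-change rate. Standard model, assumed for illustration.
import math

def expected_freshness(refresh_rate, change_rate):
    r = change_rate / refresh_rate
    return (1 - math.exp(-r)) / r

# More crawlers -> higher aggregate refresh rate -> fresher database,
# but with diminishing returns.
for crawlers in (1, 2, 4, 8):
    f = crawlers * 0.5   # hypothetical refreshes/day contributed per page
    print(crawlers, "crawlers:", round(expected_freshness(f, change_rate=1.0), 3))

Each doubling of the crawler count buys less additional freshness, which is why an optimal number of crawlers exists once server load is taken into account.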
WebParF: A Web partitioning framework for Parallel Crawlers
With the ever-proliferating size and scale of the WWW [1], efficient ways of exploring content are of increasing importance. How can we efficiently retrieve information from the Web through crawling? In this "era of tera" and multi-core processors, multi-threaded processing is a natural fit. Better still, how can we improve crawling performance by using parallel cr...
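WebParF itself is not described in this excerpt, so the sketch below shows only the generic idea behind Web partitioning for parallel crawlers: hash each URL's host to one of N crawler processes so that no two crawlers fetch the same site. Names and parameters are illustrative assumptions.

# Sketch: hash-based Web partitioning for parallel crawlers.
import hashlib
from urllib.parse import urlparse

def assign_crawler(url, num_crawlers):
    """Map a URL's host deterministically to one of num_crawlers partitions."""
    host = urlparse(url).netloc
    digest = hashlib.md5(host.encode()).hexdigest()
    return int(digest, 16) % num_crawlers

urls = ["http://example.com/a", "http://example.org/b", "http://example.com/c"]
for u in urls:
    print(u, "-> crawler", assign_crawler(u, num_crawlers=4))

Partitioning by host rather than by full URL keeps per-site politeness logic inside a single crawler and minimizes the URL exchange needed between partitions.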
Improving the performance of focused web crawlers
This work addresses issues related to the design and implementation of focused crawlers. Several variants of state-of-the-art crawlers that rely on web page content and link information to estimate the relevance of web pages to a given topic are proposed. Particular emphasis is given to crawlers capable of learning not only the content of relevant pages (as classic crawlers do) but also the paths ...
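A focused crawler differs from a breadth-first one mainly in its frontier: URLs are fetched best-first by estimated topic relevance. The sketch below uses a toy keyword-overlap score; the learned content and path models this paper proposes would replace that scoring step.

# Sketch: a focused crawler's best-first frontier with a toy relevance score.
import heapq

TOPIC = {"crawler", "web", "search", "index"}   # hypothetical topic terms

def relevance(text):
    """Fraction of topic terms present in the text; a stand-in for a learned model."""
    words = set(text.lower().split())
    return len(words & TOPIC) / len(TOPIC)

def focused_order(candidates):
    """candidates: list of (url, anchor_or_page_text); yields URLs best-first."""
    heap = [(-relevance(text), url) for url, text in candidates]
    heapq.heapify(heap)                          # max-relevance first via negated scores
    while heap:
        neg_score, url = heapq.heappop(heap)
        yield url, -neg_score

for url, score in focused_order([
    ("http://a.example/page", "web crawler search index design"),
    ("http://b.example/page", "cooking recipes"),
]):
    print(url, score)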